 dom element


A data-driven approach for learning to control computers

arXiv.org Artificial Intelligence

It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.


QWeb: Solving Web Navigation Problems using DQN

#artificialintelligence

We first formulate the MDP for our problem. QWeb solves this problem using a deep Q-network (DQN) that generates Q-values for each state and each atomic action. The training process is almost the same as in traditional DQN, with the addition of reward augmentation and some curriculum learning approaches, which we will discuss later. For now, let's focus on the architecture of QWeb, which is essentially the most fruitful part of this algorithm. Encoding user instructions: as we've seen in the preliminaries, a user instruction consists of a list of fields, i.e., key-value pairs (K, V).
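As a rough illustration of what "Q-values for each state and for each atomic action" means in a traditional DQN setup, the sketch below picks an action from a table of Q-values and computes the standard one-step DQN target. It is not QWeb's architecture: the action names, the instruction fields, and the helper names `select_action` and `td_target` are hypothetical and chosen only for this example.

```python
import random
from typing import Dict, List, Tuple

# A user instruction as a list of fields (key-value pairs).
# The keys and values below are made up for illustration.
Instruction = List[Tuple[str, str]]
example_instruction: Instruction = [("from", "city A"), ("to", "city B")]


def select_action(q_values: Dict[str, float], epsilon: float,
                  rng: random.Random) -> str:
    """Epsilon-greedy choice over the Q-values of one state.

    q_values maps each (hypothetical) atomic action name to its Q-value.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.choice(sorted(q_values))
    # Exploit: pick the action with the highest Q-value.
    return max(q_values, key=q_values.get)


def td_target(reward: float, next_q_values: Dict[str, float],
              gamma: float, done: bool) -> float:
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values.values())
```

With `epsilon` set to 0 the choice is purely greedy, so the agent always takes the action with the largest Q-value. In a real DQN the table would be replaced by a neural network that outputs these values.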


Learning to Navigate the Web

arXiv.org Machine Learning

Environments with large state and action spaces and sparse rewards can hinder a Reinforcement Learning (RL) agent's learning through trial and error. For instance, following natural language instructions on the Web (such as booking a flight ticket) leads to RL settings where the input vocabulary and the number of actionable elements on a page can grow very large. Even though recent approaches improve the success rate on relatively simple environments with the help of human demonstrations to guide the exploration, they still fail in environments where the set of possible instructions can reach millions. We approach the aforementioned problems from a different perspective and propose guided RL approaches that can generate an unbounded amount of experience for an agent to learn from. Instead of learning from a complicated instruction with a large vocabulary, we decompose it into multiple sub-instructions and schedule a curriculum in which an agent is tasked with a gradually increasing subset of these relatively easier sub-instructions. In addition, when expert demonstrations are not available, we propose a novel meta-learning framework that generates new instruction-following tasks and trains the agent more effectively. We train DQN, a deep reinforcement learning agent, with the Q-value function approximated by a novel QWeb neural network architecture, on these smaller, synthetic instructions. We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstrations, achieving a 100% success rate on several difficult environments.
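The curriculum idea in this abstract, decomposing an instruction into sub-instructions and handing the agent a gradually increasing subset, can be sketched very roughly as follows. This is only one possible reading under stated assumptions: the instruction is assumed to be a dictionary of fields, the stages simply take the first k fields in order, and the function name `curriculum_stages` and the flight-booking fields are hypothetical rather than taken from the paper.

```python
from typing import Dict, List


def curriculum_stages(instruction: Dict[str, str]) -> List[Dict[str, str]]:
    """Return sub-instructions of growing size.

    Stage k (1-based) contains the first k fields of the instruction,
    so the last stage is the full instruction.
    """
    fields = list(instruction.items())
    return [dict(fields[:k]) for k in range(1, len(fields) + 1)]


# Hypothetical flight-booking instruction, for illustration only.
booking = {"from": "city A", "to": "city B", "date": "some date"}

for stage_number, sub_instruction in enumerate(curriculum_stages(booking), 1):
    # Each stage asks the agent to handle one more field than the previous one.
    print(stage_number, sorted(sub_instruction))
```

A training loop built on this would advance to the next stage once the agent handles the current subset reliably; how the remaining fields are handled at each stage is left out of this sketch.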


Dealing with MNIST image data in Tensorflow.js

#artificialintelligence

There's the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data ... data cleaning is a much higher proportion of data science than an outsider would expect. Actually training models is typically a relatively small proportion (less than 10 percent) of what a machine learner or data scientist does. Manipulating data is a crucial step for any machine learning problem. First, the code imports TensorFlow (make sure you're transpiling your code!) and establishes some constants. For the purposes of getting started, this article will only step through the load function. The Image object is a native DOM constructor that represents an image in memory; it provides callbacks for when the image is loaded, along with access to the image attributes.


How I rented a nice place to live using Elixir and a Facebook Messenger chat bot

@machinelearnbot

I decided to create the Phoenix project as an umbrella project. An umbrella in Elixir is just a way of organizing a project into different standalone modules that depend on each other. This way, it is pretty straightforward to use parts of the project in other applications. It is also a very neat way of separating the components of your application into well-organized, reusable, and easy-to-understand modules. Phoenix is pretty cool and gives us a ready-to-use project with everything we need, from the basic web application functionality of receiving and responding to requests to database configuration, unit tests, and some documentation.